I wrote this notebook as a simple training exercise to better understand feedforward neural networks. The naming conventions in this code match Andrew Ng's free Machine Learning course on Coursera (highly recommended). This neural network has a single hidden layer.
Here's how the neural network is connected, along with the equations for calculating the hypothesis, h_theta(x).
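In the course's notation, the hypothesis for a single hidden layer is computed by two applications of the sigmoid $g$ (a sketch of the standard forward pass; note that the code below stores each $\Theta$ transposed, so the same products appear as a.dot(theta)):

$$a^{(1)} = x \quad \text{(with bias unit } a_0^{(1)} = 1 \text{ added)}$$
$$z^{(2)} = \Theta^{(1)} a^{(1)}, \qquad a^{(2)} = g(z^{(2)}) \quad \text{(with bias unit } a_0^{(2)} = 1 \text{ added)}$$
$$z^{(3)} = \Theta^{(2)} a^{(2)}, \qquad h_\Theta(x) = a^{(3)} = g(z^{(3)})$$

where $g(z) = 1/(1 + e^{-z})$ is the sigmoid function.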
This neural network also implements backpropagation during training: the difference between the hypothesis and the training data is propagated backward through the network to update the thetas, or weights.
The example has a trivial training set with X equal to

    0 0
    0 1
    1 0
    1 1

and the y vector used for this supervised learning matches the exclusive or (XOR) pattern:

    0
    1
    1
    0
Note: the images above are from Andrew Ng's Machine Learning Course.
In [1]:
# NumPy is the fundamental package for scientific computing with Python.
import numpy as np
The theta_init function is used to initialize the thetas (weights) in the network. It returns a random matrix with values in the range [-epsilon, epsilon).
In [2]:
def theta_init(in_size, out_size, epsilon=0.12):
    # The +1 row holds the bias unit's weights; np.random.rand draws
    # from [0, 1), so the result lies in [-epsilon, epsilon)
    return np.random.rand(in_size + 1, out_size) * 2 * epsilon - epsilon
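As a quick sanity check (not part of the training flow), the returned matrix has one extra row for the bias weights:

theta = theta_init(2, 5)
print(theta.shape)                  # (3, 5): 2 inputs + 1 bias row, 5 hidden units
print(np.abs(theta).max() <= 0.12)  # True: all values lie within [-0.12, 0.12)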
This network uses a sigmoid activation function. The sigmoid derivative is used during backpropagation.
In [3]:
def sigmoid(x):
    return np.divide(1.0, (1.0 + np.exp(-x)))

def sigmoid_derivative(x):
    # Expects x to already be a sigmoid activation: if a = sigmoid(z),
    # then da/dz = a * (1 - a)
    return np.multiply(x, (1.0 - x))
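A quick check against the definitions: the sigmoid is 0.5 at zero, and since sigmoid_derivative takes an activation a = g(z) rather than z itself, the slope at a = 0.5 is 0.25.

print(sigmoid(0))               # 0.5
print(sigmoid_derivative(0.5))  # 0.25, i.e. a * (1 - a) at a = 0.5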
The mean squared error (MSE) provides a measure of the distance between the actual values and what is estimated by the neural network.
In [4]:
def mean_squared_error(X):
    # Mean of the squared elements over the entire matrix
    return np.power(X, 2).mean(axis=None)
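For example, an error vector of [1, -1] squares to [1, 1] and averages to 1.0:

print(mean_squared_error(np.array([1.0, -1.0])))  # 1.0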
The nn_train function trains an artificial neural network with a single hidden layer. Each column in X is a feature and each row in X is a single training observation. The y matrix contains the classification for each observation; for multi-class problems, y will have more than one column. After training, the function returns the calculated theta values (weights), which can be used for predictions.
Training ends when the desired error is achieved or the maximum number of iterations is reached, whichever comes first.
In [5]:
def nn_train(X, y, desired_error=0.001, max_iterations=100000, hidden_nodes=5):
    m = X.shape[0]                   # number of training observations
    input_nodes = X.shape[1]
    output_nodes = y.shape[1]
    a1 = np.insert(X, 0, 1, axis=1)  # input layer with bias column added
    theta1 = theta_init(input_nodes, hidden_nodes)
    theta2 = theta_init(hidden_nodes, output_nodes)
    for x in range(0, max_iterations):
        # Feedforward
        a2 = np.insert(sigmoid(a1.dot(theta1)), 0, 1, axis=1)  # hidden layer with bias column
        a3 = sigmoid(a2.dot(theta2))                           # output layer: the hypothesis
        # Calculate error using backpropagation
        a3_delta = np.subtract(y, a3)
        mse = mean_squared_error(a3_delta)
        if mse <= desired_error:
            print("Achieved requested MSE %f at iteration %d" % (mse, x))
            break
        a2_error = a3_delta.dot(theta2.T)
        a2_delta = np.multiply(a2_error, sigmoid_derivative(a2))
        # Update thetas to reduce the error on the next iteration
        theta2 += np.divide(a2.T.dot(a3_delta), m)
        # Drop the bias unit's column from the gradient before updating theta1
        theta1 += np.delete(np.divide(a1.T.dot(a2_delta), m), 0, 1)
    return (theta1, theta2)
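Note that theta_init draws its weights from np.random.rand, so each training run starts from a different random point and will converge in a different number of iterations (or occasionally stall short of the desired error). If you want repeatable runs, seed NumPy's generator before training; the seed value itself is arbitrary:

np.random.seed(42)  # optional: makes the random weight initialization repeatable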
The nn_predict function takes the theta values calculated by nn_train to make predictions about the data in X.
In [6]:
def nn_predict(X, theta1, theta2):
    # Same feedforward pass used in training: add bias columns, apply the weights
    a2 = sigmoid(np.insert(X, 0, 1, axis=1).dot(theta1))
    return sigmoid(np.insert(a2, 0, 1, axis=1).dot(theta2))
In [7]:
X = np.matrix('0 0; 0 1; 1 0; 1 1')
y = np.matrix('0; 1; 1; 0')
(theta1, theta2) = nn_train(X, y)
print("\nTrained weights for calculating the hidden layer from the input layer")
print(theta1)
print("\nTrained weights for calculating the output layer from the hidden layer")
print(theta2)
Now that we've trained the neural network, we can make predictions for new data.
In [8]:
# Our test input has the same rows as the training input 'X', just in a different order
X_test = np.matrix('1 1; 0 1; 0 0; 1 0')
y_test = np.matrix('0; 1; 0; 1')
y_calc = nn_predict(X_test, theta1, theta2)
y_diff = np.subtract(y_test, y_calc)
print("The MSE for our test set is %f" % mean_squared_error(y_diff))
print(np.concatenate((y_test, y_calc, y_diff), axis=1))
Column one is the correct value, column two is the value predicted by this simple neural network, and the third column shows the difference. The neural network correctly learned the XOR pattern.
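Since nn_predict returns sigmoid activations in (0, 1) rather than hard labels, a simple way to turn the predictions into 0/1 classifications is to threshold at 0.5 (a common convention, not something the code above does for you):

y_labels = np.round(y_calc)              # threshold at 0.5 by rounding
print(np.array_equal(y_labels, y_test))  # True when every prediction lands on the correct side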